Rapid adaptation of n-gram language models using inter-word correlation for speech recognition
Authors
Abstract
In this paper, we study the problem of rapidly adapting an n-gram language model under the MAP estimation framework. We propose a heuristic method that exploits inter-word correlation to accelerate MAP adaptation of the n-gram model: based on these correlations, the occurrence of one word in the adaptation text is used to predict all other words. In this way, a large n-gram model can be adapted efficiently with a small amount of adaptation data. The proposed fast adaptation approach is evaluated on a Japanese newspaper corpus, where we observe a significant perplexity reduction even with only a few hundred adaptation sentences.
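As a rough illustration of the idea, the sketch below shows Dirichlet-prior MAP adaptation of a unigram model in which counts observed in the adaptation text are propagated to correlated words. This is a minimal sketch, not the authors' exact formulation: the function name, the `corr` table, and the prior weight `tau` are assumptions made here for clarity.

```python
from collections import Counter

def map_adapt_unigram(bg_probs, adapt_tokens, corr, tau=100.0):
    """Sketch of MAP adaptation where adaptation counts are propagated
    to correlated words before being combined with the background model.

    bg_probs     : dict  word -> background probability p_bg(w)
    adapt_tokens : list  of words from the (small) adaptation text
    corr         : dict  (observed_word, target_word) -> correlation in [0, 1]
    tau          : prior weight; larger values trust the background model more
    """
    observed = Counter(adapt_tokens)

    # Propagate each observed count to correlated words, so words that never
    # occur in the adaptation text still receive pseudo-counts.
    pseudo = Counter()
    for v, c_v in observed.items():
        pseudo[v] += c_v
        for w in bg_probs:
            r = corr.get((v, w), 0.0)
            if r > 0.0 and w != v:
                pseudo[w] += r * c_v

    total = sum(pseudo.values())

    # Standard MAP (Dirichlet-prior) combination of pseudo-counts and prior.
    return {w: (pseudo.get(w, 0.0) + tau * p_bg) / (total + tau)
            for w, p_bg in bg_probs.items()}
```

The paper applies the idea to full n-gram distributions; the unigram case above is only meant to show how a few hundred adaptation sentences can update probabilities across the whole vocabulary.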
Similar Papers
Rapid Unsupervised Topic Adaptation – a Latent Semantic Approach
In open-domain language exploitation applications, a wide variety of topics with swift topic shifts has to be captured. Consequently, it is crucial to rapidly adapt all language components of a spoken language system. This thesis addresses unsupervised topic adaptation in both monolingual and crosslingual settings. For automatic speech recognition we rapidly adapt a language model on a source l...
Looking at alternatives within the framework of n-gram based language modeling for spontaneous speech recognition
This paper presents different methods that use a weighted mixture of word and word-class language models to perform language model adaptation. A general language model is built from the whole training corpus; then several clusters are created according to a word co-occurrence measure, and finally word models as well as word-class models are built from each cluster. The general ...
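For illustration only, such a weighted mixture can be expressed as a linear interpolation of the general and cluster-specific models. The sketch assumes the models expose a simple per-word probability interface; names, signatures, and weights are hypothetical, not taken from the paper.

```python
def mixture_lm(p_general, cluster_models, weights):
    """Sketch of a weighted mixture of a general word LM and several
    cluster-specific word/word-class LMs.

    p_general      : callable (word, history) -> probability from the general LM
    cluster_models : list of callables with the same signature
    weights        : one weight for the general LM plus one per cluster model;
                     assumed to sum to 1 (e.g. estimated by EM on held-out text)
    """
    def p(word, history):
        prob = weights[0] * p_general(word, history)
        for lam, model in zip(weights[1:], cluster_models):
            prob += lam * model(word, history)
        return prob
    return p
```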
Unsupervised Language Model Adaptation … Transcription
Unsupervised adaptation methods have been applied successfully to the acoustic models of speech recognition systems for some time. Relatively little work has been carried out in the area of unsupervised language model adaptation however. The work presented here uses the output of a speech recogniser to adapt the backoff n-gram language model used in the decoding process. We report results for t...
Effects of word string language models on noisy broadcast news speech recognition
In this paper, we show that our n-gram-based word string language model, combined with speaker and noise adaptation of the acoustic model, improves recognition performance on noisy broadcast news speech. The focus was on remedying recognition errors of short words. The word string language models based on POS and n-gram frequency reduced deletion errors by 17%, i...
Approximated and Domain-Adapted LSTM Language Models for First-Pass Decoding in Speech Recognition
Traditionally, short-range Language Models (LMs) like the conventional n-gram models have been used for language model adaptation. Recent work has improved performance for such tasks using adapted long-span models like Recurrent Neural Network LMs (RNNLMs). With the first pass performed using a large background n-gram LM, the adapted RNNLMs are mostly used to rescore lattices or N-best lists, a...
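As a hedged sketch of the second-pass rescoring step described above (the hypothesis format, interpolation weight, and model interfaces are assumptions made here, not the paper's implementation), N-best rescoring with an interpolated n-gram/RNNLM score might look like this:

```python
import math

def rescore_nbest(nbest, ngram_lm, rnnlm, lam=0.5, lm_scale=1.0):
    """Sketch of second-pass N-best rescoring: per-word probabilities from a
    background n-gram LM and an adapted RNNLM are linearly interpolated and
    combined with the first-pass acoustic score.

    nbest    : list of (words, acoustic_logprob) hypotheses from the first pass
    ngram_lm : callable (word, history) -> probability under the n-gram LM
    rnnlm    : callable (word, history) -> probability under the adapted RNNLM
    """
    best_words, best_score = None, float("-inf")
    for words, ac_logp in nbest:
        lm_logp = 0.0
        for i, w in enumerate(words):
            p = lam * rnnlm(w, words[:i]) + (1.0 - lam) * ngram_lm(w, words[:i])
            lm_logp += math.log(max(p, 1e-12))  # floor to avoid log(0)
        score = ac_logp + lm_scale * lm_logp
        if score > best_score:
            best_words, best_score = words, score
    return best_words
```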
Journal:
Volume / Issue:
Pages: -
Publication date: 2000